ABSTRACT

Aims. Our goal is to develop a new and reliable statistical method to classify galaxies from large surveys. We probe the reliability of the method 
by comparing it with a three-dimensional classification cube (Mignoli et al. 2009), using the same set of spectral, photometric and morphological 
parameters. 

Methods. We applied two different methods of classification to a sample of galaxies extracted from the zCOSMOS redshift survey, in the redshift
range 0.5 ≲ z ≲ 1.3. The first method is the combination of three independent classification schemes - a spectroscopic one based on the strength
of the continuum break at 4000 Å and the rest-frame equivalent width of the [O II] emission line, a photometric one based on observed B - z colours,
and a morphological one adapted from Scarlata et al. (2009) - while the second method exploits an entirely new approach based on statistical analyses
such as Principal Component Analysis (PCA) and the Unsupervised Fuzzy Partition (UFP) clustering method. The PCA+UFP method has also been applied
to a lower redshift sample (z ≲ 0.5), exploiting the same set of data except for the spectral ones, which are replaced by the equivalent width of Hα.
Results. The comparison between the two methods shows fairly good agreement on the definition of the two main clusters, the early-type and the
late-type galaxies. Our PCA-UFP method of classification is robust, flexible and capable of identifying the two main populations of galaxies
as well as the intermediate population. The intermediate galaxy population shows many of the properties of the "green valley" galaxies, and
constitutes a more coherent and homogeneous population. The fairly large redshift range of the studied sample allows us to observe the downsizing
effect: galaxies with masses of the order of 3 × 10^10 M_⊙ are mainly found in transition from the late type to the early type group at z > 0.5, while
galaxies with lower masses - of the order of 10^10 M_⊙ - are in transition at later epochs; galaxies with M < 10^10 M_⊙ have not yet begun their
transition, while galaxies with very large masses (M > 5 × 10^10 M_⊙) mostly completed their transition before z ~ 1.

Key words. galaxies: general - galaxies: evolution - galaxies: fundamental parameters



1. Introduction 

It is well known that galaxies show a large assortment of
observational and intrinsic features. In the local and near universe
(up to z ~ 1; Bell et al. 2004), many of these properties, such
as optical colours (Strateva et al. 2001; Ball et al. 2006),
morphological parameters (Driver et al. 2006), and spectral indices
(Kauffmann et al. 2003; Balogh et al. 2004), are known to be
bimodally distributed. The origin of these bimodalities is not yet
clear in terms of galaxy evolution (Blanton et al. 2003). The
existence of two different groups has been explained in the past as
a matter of different initial conditions (galaxies having different
mechanisms of formation): either a dissipationless collapse,
leading to the formation of an elliptical galaxy and the dispersion
of its gas content, or a dissipative one, resulting in a spiral
galaxy which retained its gas and could subsequently maintain its
star formation (Ellis et al. 2005). The most widely accepted current
cosmological models, however, predict that the formation of galaxies
is mostly hierarchical, massive ellipticals being the result of a
series of major mergers between smaller spiral galaxies (Cole et al.
1994; Baugh et al. 1996; Schweizer 2000, for a review). For these
reasons the widely accepted scenario to explain the bimodal
segregation of the galaxy properties is an evolutionary one: galaxies
in different phases of their evolution show different colours,
different star formation rates and different morphologies. How these
different parameters are connected is still a matter of debate
(Conselice 2006); it appears clear, however, that a better knowledge
of these connections would help develop a deeper understanding of the
physical processes behind galaxy evolution.



The purpose of this work is to develop a robust and powerful
method to classify galaxies from large surveys, in order to
establish and confirm the connections between the principal
observational parameters of the galaxies (spectral features, colours,
morphological indices), and to help unveil the evolution of these
parameters from z ~ 1 to the local Universe. This paper makes
use of zCOSMOS and COSMOS survey data, and capitalizes on
the reliability and size of these datasets.

The paper is organized as follows: in §2 we will briefly
describe the zCOSMOS survey and the sub-samples of the data
used in this paper; in §3 we will present the extension to the 10k
zCOSMOS-bright sample of the classification cube method
presented by Mignoli et al. (2009, hereafter M09) as applied to a
smaller sample; in §4 we will present a new method of
classification, based on statistical tools like Principal Component
Analysis and Cluster Analysis; in §5 we will discuss and
comment on the results of the two combined methods, and present a
quick review of some interesting sub-populations; in §6 we will
present final remarks and the general picture emerging from this
work.

Throughout this paper, unless otherwise stated, we assume a
concordance cosmology with Ω_m = 0.25, Ω_Λ = 0.75 and H_0 =
70 km s^-1 Mpc^-1; magnitudes are expressed in the AB system.



2. Description of zCOSMOS 



zCOSMOS (Lilly et al. 2007, 2009) is a large redshift survey
which has been carried out using the VIMOS spectrograph (Le
Fèvre et al. 2005) installed at the 8 m UT3 "Melipal" of the
European Southern Observatory's Very Large Telescope at Cerro
Paranal. The main goal of the survey is to trace the large scale
structure of the universe up to z ~ 3 and to characterize galaxy
groups and clusters.

In order to exploit the resources of the VIMOS spectrograph
more efficiently, the zCOSMOS survey has been split into
two distinct parts:

- zCOSMOS-bright, a magnitude-limited (I_AB < 22.5) survey
that, once completed, will consist of ~ 20000 galaxies in
a redshift range of 0.1 < z < 1.2. This part of the survey
is being undertaken on the 1.7 deg^2 COSMOS field fully
covered by the ACS camera of the Hubble Space Telescope
(Koekemoer et al. 2007);

- zCOSMOS-deep, a survey whose ~ 10000 galaxies are
selected through various colour criteria, with a redshift range
of 1.4 < z < 3.0, in the central 1 deg^2 of the COSMOS field.

The specifications of the bright part of the survey include a
very high success rate in redshift determination (~ 90%), a
uniform sampling rate across the whole field, and fairly good
velocity accuracy (~ 100 km s^-1), which together allow the dynamical
environment of the galaxies to be defined.

The data release this paper is based upon, called the 10k sample,
is made up of 10 642 galaxies from the zCOSMOS-bright part of
the survey, regardless of the spectral quality. Our first working
sample is composed of 4 874 galaxies with 0.48 < z < 1.28: this
will be referred to as the high redshift whole sample. This choice is
due to the fact that, given the spectral range of the observations
(5550-9650 Å), the spectral features around rest-frame 4000 Å
that we use in this work (the continuum break at ~ 4000 Å -
from now on D4000 - and the [O II] emission line) can be
detected only in that redshift range. The high redshift high quality
sample, instead, is composed of all the galaxies with
spectroscopic flag 4, 3 and 2.5, i.e. galaxies with secure redshifts, or
likely redshifts confirmed by the photometric one (for a more
detailed review of spectral confidence flags, see Lilly et al. 2009).
Galaxies with spectroscopic flag=1 are excluded because of their
poorly-defined spectral features, while flag=9 galaxies are excluded
because of the absence of other spectral features besides a single
strong emission line; this high quality subset is composed of
3 720 objects (76% of the whole sample). The subsequent
extension of the work to lower redshifts, achieved by substituting
D4000 and EW_0[O II] with the rest-frame equivalent width of
Hα (EW_0(Hα)), builds up a different dataset composed of 3 402
galaxies (low redshift whole sample); the corresponding low
redshift high quality sample is made up of 3 005 galaxies (88% of
the whole sample). It should be noted that, throughout the
analysis, the errors associated with the parameters
were not included, since many parameters (like the
morphological ones) were not given an error. Furthermore, spectroscopic
stars and broad-line active galactic nuclei have been excluded
from both samples.

3. The classification cube method 

We extended the classification method developed by M09, which was
applied to the first release of the zCOSMOS-bright catalogue (the
so-called 1k sample, composed of ~ 1 000 galaxies), to the larger
dataset provided by the 10k sample. This classification is based
on three independent datasets (spectroscopic, photometric,
morphological) which exploit the bimodality shown by galaxies in
many features.

3.1. Spectral classification 

Spectral measurements of the 10k sample were carried out by 
the automatic computer code PlateFit (Lamareille et al. 2006). 
The program analyses the galaxy spectra and performs measure- 
ments of equivalent width and flux for the most important spec- 
tral features. 

We classified galaxies in the sample using the diagram of
D4000 vs. rest-frame equivalent width of [O II] (from now on
EW_0[O II]) developed by Cimatti et al. (2002) and extensively
used in many works, e.g. Kauffmann et al. (2004); Mignoli et al.
(2003); Franzetti et al. (2007). D4000 is a tracer of cumulative
star formation: galaxies with stronger 4000 Å breaks had
a longer history of forming stars (Bruzual 1983; Marcillac et al.
2006); on the other hand, the presence of [O II] in emission is
an effective signature of ongoing star formation (Kewley et al.
2004; Kennicutt 1998). Upper limits to the observed equivalent
widths of [O II] emission lines have been computed using the
empirical relation proposed by Mignoli et al. (2005), and
compared to the values of the upper limits produced by PlateFit.
The empirical envelope relation, which replaces PlateFit upper
limits when those are lower, is:



$$EW_{\rm lim} = \frac{SL \cdot \Delta}{(S/N)_{\rm cont}} \qquad (1)$$

where SL = 3 is the significance level of each line, Δ is
the spectral resolution (in Å) and (S/N)_cont is the signal-to-noise
ratio of the spectrum calculated in the proximity of the line.
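
As an illustration of Eq. (1), the following is a minimal sketch of how the adopted upper limit could be computed. The function name, argument names and the numerical values in the usage note are hypothetical; only the relation itself and the rule "use the envelope when the PlateFit limit is lower" come from the text.

```python
def ew_upper_limit(resolution_aa, snr_cont, platefit_limit=None, sl=3.0):
    """Adopted upper limit on an equivalent width, in Angstrom.

    resolution_aa  : spectral resolution Delta, in Angstrom
    snr_cont       : continuum signal-to-noise ratio near the line
    platefit_limit : upper limit from the fitting code, if available
    sl             : significance level (SL = 3 in the text)
    """
    if snr_cont <= 0:
        raise ValueError("continuum S/N must be positive")
    # Empirical envelope: EW_lim = SL * Delta / (S/N)_cont
    envelope = sl * resolution_aa / snr_cont
    if platefit_limit is None:
        return envelope
    # The envelope replaces the fitted limit only when the latter is lower.
    return max(envelope, platefit_limit)
```

For instance, with an illustrative resolution of 12 Å and a continuum S/N of 6, the envelope is 6 Å, and a lower fitted limit of 2 Å would be replaced by it.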

In Fig. 1 the D4000-EW_0[O II] plane is shown. The
horizontal dashed line represents the cut at 5 Å in EW_0[O II] used
to separate strong and weak line emitters, adopted by M09.
We used an iterative σ-clipping least squares algorithm to
constrain the regions of highest density, obtaining the following
boundaries:

$$1.64 < D4000 + 0.36 \log(EW_0[\mathrm{O\,II}]) < 2.14 \qquad (2)$$

This is somewhat narrower than Eq. (2) in M09,
especially toward the left side of the diagram - low D4000
values - due to a lower σ rejection in the algorithm.

Fig. 1. Spectral classification diagram for the 10k high quality
zCOSMOS sample. In red are passive galaxies, in blue star
forming galaxies and in green red emitters. Small arrows mark
objects for which we have only upper limits in EW_0[O II].
Numbers represent the fraction of objects belonging to each
class.



We defined as star-forming the 66% of the spectroscopic high
quality galaxies with low values of D4000 and high
values of EW_0[O II], and as quiescent the 21% with
low values of EW_0[O II] and high values of D4000. Galaxies
populating the upper-right part of the diagram, which are
8.5% of the total, are defined as intermediate
galaxies, with a quiescent-like continuum but strong
emission lines; they are mainly associated with AGNs.

The left part of the diagram is mainly populated by objects with
low quality spectra; high quality objects in this region (which
are 4% of the total high quality sample) reside mostly near the
boundary.

Considering the high quality sample only, nearly 88% of the
galaxies are classified in one of the two main classes. When
the constraints on the required confidence in the spectral
features are relaxed, the fraction of galaxies in each area of the
D4000-EW_0[O II] plane is mostly unchanged.

3.2. Photometric classification 

We introduce another classification based on the photometric
properties of the galaxies. In the lower panel of Fig. 2 the colour
B - z of the galaxies (Capak et al. 2007) is shown as a function
of their redshift. We used the B - z colour because of its
effectiveness in separating the two galaxy classes in the redshift range
explored by the zCOSMOS bright sample (M09). Spectroscopic
star-forming galaxies (blue triangles) have lower B - z and thus
are bluer than both quiescent and intermediate galaxies
(respectively red squares and magenta dots). As a way of discriminating
the two populations, we used the colour track of an Sab galaxy
template, from the set provided by Coleman, Wu, & Weedman
(1980) (see discussion in M09).



Fig. 2. Photometric classification of the 10k zCOSMOS-bright
high quality sample. In the lower panel colour B - z versus
redshift z is shown: blue triangles are star-forming, red squares are
quiescent, green dots are red emitting galaxies. The solid line
represents the evolutionary B - z track of a template Sab galaxy
from Coleman et al. (1980) (Sawicki et al. 1997). In the upper
panel the distributions of Δ(B - z), as defined in Eq. (3), for
star-forming galaxies (blue histogram), quiescent galaxies (red
histogram) and red emitting galaxies (green histogram) are plotted.
The dashed line represents Δ(B - z) of the Sab galaxy
evolutionary track used as separator.



B - z     Quiescent      Star-forming   Total
Red       983 (1167)     227 (318)      1210 (1485)
Blue      208 (320)      2431 (3081)    2639 (3401)
Total     1191 (1487)    2658 (3399)    3849 (4886)

Table 1. Summary of the number of high spectral quality
galaxies in spectroscopic and photometric classifications. Between
parentheses are figures from the whole sample.



Galaxies classified as intermediate on the basis of their
spectral properties are distributed in the same region as the quiescent
ones; this can be seen in the upper panel, which shows the
distribution of the distances between measured colours and the
colour of the template at the redshift of the galaxy:

$$\Delta(B - z) = (B - z)_{\rm obs} - (B - z)_{\rm templ} \qquad (3)$$

We use the quantity Δ(B - z) to segregate the galaxies
photometrically: if Δ(B - z) > 0 galaxies are considered "red",
while if Δ(B - z) < 0 galaxies are put in the "blue" class.
Since, as noted above, intermediate galaxies seem to share colours
with the quiescent galaxies, we decided to merge these
spectroscopic classes into one general "quiescent" category.

In Table 1 the 2x2 contingency table for spectral and
photometric classifications is shown: almost 90% of the high quality
sample shows a full agreement between the spectral and
photometric classifications (87% for the whole sample). Cohen's
kappa coefficient for inter-rater agreement is 0.74, confirming
that the classifications are statistically consistent.
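
The agreement figures quoted above can be checked directly from the counts in Table 1. The sketch below applies the standard definition of Cohen's kappa to the high quality sample; the function and variable names are illustrative, not taken from the original analysis.

```python
import numpy as np

def cohen_kappa(table):
    """Cohen's kappa for a square contingency table of two classifications."""
    t = np.asarray(table, dtype=float)
    n = t.sum()
    p_obs = np.trace(t) / n                                   # observed agreement
    p_exp = (t.sum(axis=1) * t.sum(axis=0)).sum() / n ** 2    # chance agreement
    return (p_obs - p_exp) / (1.0 - p_exp)

# Rows: red, blue; columns: quiescent, star-forming (Table 1, high quality).
table1_hq = [[983, 227],
             [208, 2431]]

kappa_hq = cohen_kappa(table1_hq)
agreement_hq = (983 + 2431) / 3849   # fraction on the concordant diagonal
print(f"agreement = {agreement_hq:.3f}, kappa = {kappa_hq:.2f}")
```

The concordant diagonal holds 3414 of 3849 galaxies, which is the "almost 90%" agreement, and the resulting kappa rounds to the quoted 0.74.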

3.3. Morphological classification 



Morphology data are provided by Scarlata et al. (2007), who
built their Zurich Estimator of Structural Types (ZEST) by
performing a Principal Component Analysis (PCA) on 5 parameters
derived directly from HST/ACS images of the COSMOS survey
(Koekemoer et al. 2007).

The ZEST classification scheme adopts a main
morphological index, which is 1 (for elliptical galaxies), 2 (for spirals) or
3 (for irregulars), plus an additional bulgeness parameter (only
for galaxies with main index of 2), calculated from galaxy Sérsic
indices. In this way spiral galaxies are further divided into four
subclasses: 2.0, 2.1, 2.2, 2.3 going from bulge dominated
spirals to disk dominated, largely following the Hubble classification
of spiral galaxies from S0 through Sc types.

We assigned ZEST type 2.2, 2.3 and 3 galaxies to a
common morphological category, the disk-dominated and irregular
galaxies, and ZEST types 1 and 2.0 to another common category,
the ellipsoidal galaxies. ZEST type 2.1 galaxies (spiral galaxies with an
intermediate bulge-to-disk ratio) are further divided according
to their colour properties: indeed, most (83%, 360/436)
spectroscopic star-forming galaxies of ZEST type 2.1 have a negative
Δ(B - z), and are therefore classified as "blue", while a similar
percentage (82%, 287/350) of spectroscopic quiescent galaxies
have Δ(B - z) > 0 and are classified as "red". Therefore, we
included the "red" population of the ZEST 2.1 type in the
morphologically ellipsoidal class and the "blue" population
in the disk-dominated class (see discussion in M09).

In Table 3 we present the numerical results of our
morphological classification. Cohen's kappa coefficient is ≈ 0.67
for the high quality sample, indicating that the morphological and
spectral classifications are consistent.



ZEST \ spectral    Quiescent      Star-forming   Total
ellipsoidal        607 (717)      236 (292)      843 (1009)
2.1                350 (410)      436 (528)      786 (938)
disk-dominated     141 (232)      1860 (2391)    2001 (2623)
Total              1098 (1359)    2532 (3211)    3630 (4570)

Table 2. Summary of the number of high spectral quality
galaxies in spectroscopic and morphological (ZEST) classifications.
Between parentheses are figures from the whole sample.

3.4. The cube

To better analyse the correlations and similarities of our galaxies,
we merged the three classifications (spectroscopic, photometric
and morphological) into a three-axial framework, a classification
cube. To simplify the classification we assigned to each galaxy a
3-digit numerical flag which encompasses information from the
three categories:

- The first digit represents the spectral classification. Flags 1
and 2 classify a galaxy as a "quiescent" and "star-forming"
type, respectively.

- The second digit stands for the colour classification. Flags 1
and 2 classify a galaxy as a "red" and "blue" type,
respectively.

- The third digit is the morphological flag. Flags 1 and 2
classify a galaxy as a "spheroidal" and "disk/irregular" type,
respectively.

So, for instance, a "212" flag denotes a star-forming,
disk-dominated galaxy with Δ(B - z) > 0, and therefore red.

cube    # high-q    % high-q    # all    % all
111     846         23.3%       985      21.4%
222     2171        59.9%       2743     59.7%
121     48          1.3%        64       1.4%
212     49          1.3%        74       1.6%
211     168         4.6%        255      5.5%
122     139         3.8%        216      4.7%
221     144         4.0%        169      3.7%
112     65          1.8%        94       2.0%
TOT     3630        100%        4600     100%

Table 4. Complete classification cube. The column "cube"
contains the 3-digit identifier for the classifications adopted in this
paper: first, second and third digit represent respectively
spectral, photometric and morphological classifications.
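
The construction of the three-digit flag can be sketched as follows. This is a minimal illustration with hypothetical function and label names; the treatment of a galaxy with Δ(B - z) exactly equal to zero is not specified in the text and is assigned to the "blue" class here purely as an assumption.

```python
def cube_flag(spectral, delta_bz, morphology):
    """Build the 3-digit classification-cube flag for one galaxy.

    spectral   : "quiescent" or "star-forming"
    delta_bz   : colour offset from the Sab template, Delta(B - z)
    morphology : "spheroidal" or "disk/irregular"
    """
    spec_digit = {"quiescent": 1, "star-forming": 2}[spectral]
    # Red if Delta(B - z) > 0, blue otherwise (zero is an assumed edge case).
    colour_digit = 1 if delta_bz > 0 else 2
    morph_digit = {"spheroidal": 1, "disk/irregular": 2}[morphology]
    return f"{spec_digit}{colour_digit}{morph_digit}"


def is_concordant(flag):
    """True for the fully concordant classes "111" and "222"."""
    return flag in ("111", "222")
```

For example, `cube_flag("star-forming", 0.3, "disk/irregular")` gives the "212" case discussed in the text, which `is_concordant` reports as discordant.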

Table 4 shows the summary of the 3D classification cube.
After removing from the high redshift whole sample the objects for which
the full set of data was not available, the full sample of the cube
retains 4 600 sources, while the high quality sub-sample is made
up of 80% of them (3 630). Figures change very little between
the two samples: almost 60% of the sources show a fully
concordant "222" classification (star-forming spectra, blue colours,
disk-dominated morphologies) and more than 20% of the sample
is composed of "111" galaxies (quiescent spectra, red colours,
spheroidal morphologies). On the whole, 83% of the galaxies
show a fully concordant cube classification, very similar to the
85% of concordance shown by the smaller zCOSMOS-bright 1k
sample (see M09).

This agreement supports the validity of this kind of
classification: the vast majority of the galaxies in the sample belong
to one of the two larger classes that show concordant behaviour
in spectral, photometric and morphological properties. In these
three fundamental observational features, bimodality is a major
property of the galaxy population, both when these features are
considered one at a time and when they are compared jointly.



morph + B - z     Quiescent      Star-forming   Total
Spheroidal        894 (1049)     312 (394)      1206 (1443)
Disk/Irregular    204 (310)      2220 (2817)    2424 (3127)
Total             1098 (1359)    2532 (3211)    3630 (4570)

Table 3. Spectral-morphological contingency table. Figures are
for the high quality sample; between parentheses are figures for
the full sample.



4. PCA-Clustering classification method 

Bimodality is an intrinsic property of galaxies, not only when
considering single specific characteristics like colours, spectral
indices, morphologies etc., but also when taking those properties as a
whole, as we have seen in the previous section. A classification
cube stands on its own because of this global bimodality, which
tells us that galaxies are well divided into two categories, "early
types" and "late types". How these two categories relate to each
other is still a matter of debate, and the characterisation of
transitional galaxies - objects that represent the bridge from one
category to another, the so-called green valley - is of paramount
importance for defining the evolutionary history of the galaxies
and for understanding how and why galaxies migrate between
categories.

For these reasons we decided to take a more global look
at our sample, considering the properties of galaxies as a whole. To
accomplish this task, we applied a Principal Component Analysis
to our sample and a Cluster Analysis to identify the loci of early
type and late type galaxies.

Fig. 3. 2D density maps of the high redshift galaxies in the PC1-PC2
plane (upper panels) and in the PC1-PC3 plane (lower panels). Left
maps are derived from the whole sample, while right ones are
derived from the high quality sample only. The global bimodality
of galaxy properties is clearly visible, represented by the
two "clumps" in density.



Fig. 4. Biplot of our PC1-PC2 plane. Black points are the galax- 
ies as expressed in terms of PCs, while blue arrows represent the 
"direction" in which each original variable tends to scatter the 
data. 



4.1. Principal Component Analysis

The Principal Component Analysis (PCA; Pearson 1901;
Hotelling 1933) is an orthogonal linear transformation useful to
reduce multidimensional data sets to lower dimensions, in order
to facilitate subsequent analysis. It transforms the data to a new
coordinate system such that the greatest variance by any
projection of the data comes to lie on the first coordinate (called the
first principal component), the second greatest variance on the
second coordinate, and so on. For this reason PCA is the ideal
tool to study a large number of parameters, allowing us to
understand their importance and correlations.

Our PCA run involved 8 major observational properties of
the sample: two parameters are derived from spectra (the D4000
break and the EW_0[O II]); one is derived from the
photometric analysis (Δ(B - z)) and the remaining parameters are
morphological: M20 (second-order moment of the brightest 20% of
galaxy flux), concentration C (ratio between radii including 80%
and 20% of galaxy light), Gini coefficient G (uniformity of light
distribution), asymmetry A (rotational symmetry of light
distribution) and clumpiness S, as taken from the ZEST catalogue. We
chose these parameters in order to keep our results comparable
to the previous classification, the 3D cube, which makes use of
the same observables.

The first step required to apply the PCA to a data set is
to normalise the observables involved. Thus, we took the
logarithm of EW_0[O II], as this variable follows a
log-normal distribution. Therefore, from now on we will be referring
to log(EW_0[O II]) every time we mention the equivalent width
of [O II].

The result of the PCA application to our eight variables is a
rotated eight-dimensional space, where every new variable (PCx,
where x ∈ ℕ, x ≤ 8) is a linear combination of the original ones:

$$PC_x = \sum_{i=1}^{8} a(i)_x V_i \qquad (4)$$

where -1 ≤ a(i)_x ≤ 1 are the coefficients of the linear
transformation and V_i are the original variables.

In Table 5 the coefficients a(i)_x of our PCA are shown.
Coefficients show the relative importance of the original
variables in each eigenvector PCx: the larger the absolute value of
a(i)_x, the stronger the importance of the associated variable within the
principal component. The last two rows of the PCA table show the
proportional variance (how much variance is expressed by each
single PC) and the cumulative variance (how much variance is
explained by that PC together with all the previous ones). We required
the retained cumulative variance to be at least 80% of the original total
one, so we decided to keep the first three PCs, which explain
84% of the original variance.

Fig. 3 shows the density of the data points in the PC1-PC2
and PC1-PC3 planes, obtained via kernel density estimation with
an axis-aligned bivariate normal kernel, evaluated on a square
grid (Venables & Ripley 2002). The plot shows the isodensity
contours of the points, both using lines of equal density and a
colour-coded 2D map: the global bimodal nature of the whole population of
galaxies is reflected by the two "clumps" in density, separated by
a narrow under-dense "valley", in which transitional objects lie.
The global bimodality is much more evident in the high
quality sample, due to better measurements of the spectral features
involved.
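
A density map of this kind can be sketched with a product of two one-dimensional normal kernels evaluated on a square grid. The bandwidths `hx` and `hy`, the grid size and the 3-bandwidth padding are illustrative assumptions; the text does not state the values actually used.

```python
import numpy as np

def kde2d(x, y, hx, hy, n_grid=50):
    """2D kernel density estimate with an axis-aligned normal kernel.

    Returns the grid coordinates gx, gy and density[i, j] at (gx[i], gy[j]).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Square n_grid x n_grid grid, padded by three bandwidths on each side.
    gx = np.linspace(x.min() - 3 * hx, x.max() + 3 * hx, n_grid)
    gy = np.linspace(y.min() - 3 * hy, y.max() + 3 * hy, n_grid)
    # One-dimensional normal kernels, shape (n_grid, n_points).
    kx = np.exp(-0.5 * ((gx[:, None] - x[None, :]) / hx) ** 2) / (np.sqrt(2 * np.pi) * hx)
    ky = np.exp(-0.5 * ((gy[:, None] - y[None, :]) / hy) ** 2) / (np.sqrt(2 * np.pi) * hy)
    # Product kernel summed over the data points, normalised by their number.
    density = kx @ ky.T / x.size
    return gx, gy, density
```

Because the kernel is a product of two independent normals, the estimate reduces to a single matrix product, and the resulting array can be passed directly to a contour-plotting routine.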

Parameter         PC1      PC2      PC3      PC4      PC5      PC6      PC7      PC8
D4000           -0.368    0.117    0.423    0.062   -0.653    0.329   -0.365   -0.026
EW_0[O II]       0.359   -0.056   -0.429   -0.245   -0.733   -0.177    0.233   -0.025
Δ(B - z)        -0.392    0.139    0.388    0.023   -0.114   -0.525    0.621    0.039
G               -0.367    0.304   -0.415    0.031    0.002   -0.571   -0.522   -0.038
M20              0.419   -0.013    0.323    0.131   -0.058   -0.314   -0.261    0.730
C               -0.400    0.125   -0.289   -0.447    0.065    0.320    0.160    0.640
A                0.185    0.772   -0.160    0.488   -0.028    0.234    0.215    0.066
S                0.278    0.510    0.318   -0.693    0.124   -0.052   -0.119   -0.222
Prop. Variance   0.586    0.142    0.109    0.063    0.043    0.024    0.022    0.011
Cum. Variance    0.586    0.728    0.838    0.901    0.944    0.968    0.990    1.000

Table 5. Results of the Principal Component Analysis applied to eight different properties of the galaxies. Absolute values of the
coefficients show the relative importance of the original variables within each Principal Component; a negative coefficient means an
anti-correlation.



It is interesting to note that Disney et al. (2008) stated that
only one parameter should be sufficient to describe the nature of
a galaxy, although they were not able to identify it: our PCA
shows that the bimodality unfolds in the PC1 direction
alone. Although PC1 cannot be that single simple parameter, it
is a very interesting fact that the main properties of a galaxy can
be described just by looking at its PC1 value.

The so-called biplot is a very useful tool to understand the
relationships between the original variables and the PCs (Gabriel
1971), and in our work it can help explain why galaxies
arrange themselves in this way in the PC space. In the biplot in
Fig. 4 the arrows represent the axes along which each original variable
lies, and their length is an index of their "strength", their
importance within each PC - in mathematical terms the coefficients
a(i)_x shown in Table 5, also called loadings. Looking at the
coefficients of D4000, EW_0[O II], Δ(B - z), G, M20 and C within
PC1, for instance, one can see that they are roughly the same
(in absolute value): this explains why in the biplot the corresponding
arrows have more or less the same length along the PC1 axis.

Fig. 4 shows that D4000 and Δ(B - z) are strongly correlated,
because the arrows point in the same direction and have similar
strength. The EW_0[O II] is anti-correlated with both of them, and
this is somewhat expected given the spectral classification shown
in Fig. 1: most galaxies with high values of D4000 have weak or
no emission lines, and vice-versa. Δ(B - z) increases with D4000,
so basically redder galaxies have a larger D4000, and this is also
expected from Fig. 2. We note also that C and G are strongly
correlated: G is a measure of how uniformly the flux is distributed
among pixels in the galaxy image, so more concentrated galaxies
have a larger value of G. M20 is anti-correlated with the two other
morphological parameters: since M20 is a measure of how many
bright off-centred knots of light are present, the greater the
value of M20, the "later" the galaxy, because disk-dominated
galaxies have more bright spots (star formation regions, spiral
arms, bars) than spheroidal or elliptical galaxies.

Taking into consideration only PC2, we can see that
asymmetry A and clumpiness S are very strongly correlated: the larger
the value of PC2 of a galaxy, the more disturbed its morphology
is. Objects with low values of PC2 show more regular
morphologies, and are separated by their values of the other morphological
parameters like C, M20 and G.

4.2. Cluster analysis 

Cluster analysis is based on partitioning a collection of data
points into a number of subgroups, where the objects inside a
cluster show a certain degree of closeness or similarity. Hard
clustering assigns each data point (feature vector) to one and
only one of the clusters, with a degree of membership equal
to one, assuming well defined boundaries between the clusters.
This model often does not reflect real data,
where boundaries between subgroups might be fuzzy, and where
a more nuanced description of an object's affinity to a specific
cluster is required. For this reason we applied a fuzzy clustering
method to our PCA-reduced sample in order to segregate
galaxies between the two clusters.

Our method makes use of the Unsupervised Fuzzy Partition
(UFP) clustering algorithm as introduced and developed by Gath
& Geva (1989). The approach of this method is Bayesian: first it
is required to run a partition algorithm to provide first guesses of
memberships and cluster centroids. This is achieved via a
modification of the fuzzy K-means algorithm (Bezdek 1973). These
prototypes are then used by the second algorithm (Fuzzy
modification of maximum likelihood estimation - FMLE) to achieve
optimal fuzzy partition (Geva et al. 2000).
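
To illustrate the first stage, the sketch below implements the plain fuzzy K-means (fuzzy c-means) iteration with Euclidean distances. It is not the modified version used in the UFP algorithm, and it does not include the FMLE refinement; the fuzziness exponent `m = 2`, the random initialisation and all names are assumptions made for illustration.

```python
import numpy as np

def fuzzy_c_means(points, n_clusters=2, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Plain fuzzy c-means: returns memberships (n, k) and centroids (k, d)."""
    x = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Random initial memberships, normalised so each row sums to one.
    u = rng.random((x.shape[0], n_clusters))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        w = u ** m
        # Centroids are membership-weighted means of the data points.
        centroids = (w.T @ x) / w.sum(axis=0)[:, None]
        dist = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)          # avoid division by zero
        # u_ik is proportional to d_ik^(-2/(m-1)), normalised over clusters.
        inv = dist ** (-2.0 / (m - 1.0))
        u_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(u_new - u).max() < tol
        u = u_new
        if converged:
            break
    return u, centroids
```

Each row of the returned membership matrix sums to one, so it can be read as the share of a point's membership in each cluster and used as the starting guess for a later refinement step.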

Fig. 5 shows 2D projections of the application of the UFP
clustering algorithm to our 3D dataset. The global bimodality
shown by the PCA application is confirmed and well defined
by the UFP algorithm. As already noted in §4.1, the leftmost
objects (in red in the plot) are the early type galaxies, while in the
rightmost part of the diagram (in blue) are the late type galaxies.
Fig. 6 provides 3D visualizations of the data, showing the
distribution in PC space of the different galaxy populations.

Since this is a fuzzy partitioning method, objects do not belong
to just one cluster: for any given data point, the probability of
membership is spread across all the clusters, with the
memberships for all clusters summing to 1.

In our work we assign objects to a cluster only if their
probability of membership for one of the clusters is P > 0.9. We
chose this threshold because, due to the exponential nature of
the FMLE distance function, there is a steep rise in the
probability function until P ~ 0.9, followed by a general flattening
for P > 0.9. In Fig. 5 red objects are galaxies which belong to the
"early type cluster" with a probability of more than 90%, while
blue objects are galaxies which belong to the "late type cluster"
with the same probability threshold. All other galaxies (those
which belong to a cluster with a probability 0.5 < P < 0.9) are
marked in green.
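
The assignment rule just described can be sketched in a few lines of
Python. The membership values below are invented for illustration, and
the function name and the use of -1 as the label for intermediate
objects are assumptions of this example, not part of the original
analysis.

```python
import numpy as np

def assign_clusters(U, threshold=0.9):
    """Return the cluster index of each object, or -1 if no membership
    exceeds the threshold (intermediate, "green" objects)."""
    U = np.asarray(U, dtype=float)
    best = U.argmax(axis=1)  # most probable cluster for each object
    return np.where(U.max(axis=1) > threshold, best, -1)

# Hypothetical memberships for four objects and two clusters
# (column 0: early type, column 1: late type); each row sums to 1.
U_demo = np.array([[0.97, 0.03],
                   [0.08, 0.92],
                   [0.60, 0.40],
                   [0.50, 0.50]])
labels_demo = assign_clusters(U_demo)
```

With these inputs the first object falls in the early type cluster, the
second in the late type cluster, and the last two remain intermediate.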

Early type galaxies, defined in this way, represent almost 
30% of the entire sample (1413 objects), while late types are 
62% (3035) and the other 8% (426) are classified as intermediate 
objects. The early types' locus here is more populated than the
corresponding class in the classification cube (the "111" class),
which was composed by 23% of the total sample (TableQ. This 
Fig. 5. Result of the Unsupervised Fuzzy Partition (UFP)
clustering algorithm applied to the PCA-reduced whole sample: the
upper panel represents the PC1-PC2 plane, while the lower panel
represents the PC1-PC3 plane. Early type galaxies are in red,
late type galaxies in blue and our intermediate objects in green.
Brown lines are the intersections on both planes of the
isoprobability surfaces with probabilities 70% and 90%. Black
curves are the isodensity contours of the points in the planes,
computed via Gaussian kernel smoothing.



is due to several reasons. First, the 90% membership threshold for
the UFP cluster analysis, which seemed a fair choice given the
shape of the probability function, is nevertheless more or less
arbitrary; choosing a 95% membership threshold, for instance,
lowers the percentage of early type objects to ~ 20%. Moreover, the
classification cube considers 8 different classes of objects, while
PCA+UFP considers only 3: many of the outliers in the
classification cube (all the 121s and the 211s, and a great part of
the 112s and 221s) are now classified as early types in PCA+UFP.
If they were classified as fully concordant 111s in the
classification cube, this class would make up ~ 31% of the whole
sample. Finally, one must keep in mind that the "early type cluster",
as defined by PCA+UFP, is not intended to be made up of purely
passive galaxies; rather, it also includes bulge-dominated,
weakly star-forming objects.

Most of the differences between the two methods can be
ascribed to errors and misclassifications due to the "hard
partitioning" logic of the old cube classification: each of the
sub-classifications of the cube was characterized by clear-cut
boundaries that can produce misclassifications, especially
for objects that lie close to those boundaries.



Fig. 6. Two different three-dimensional visualizations of the PC
space. The colours represent the clusters as defined by the UFP
cluster analysis in Fig. 5. Different colour intensities represent
the distance of each point from the vantage point, to convey the
depth of the distribution of points.



Another culprit could be the high number of morphological
parameters in the PCA+UFP analysis, which might give them
greater weight to the detriment of the other parameters; however,
several runs of the PCA+UFP algorithms with fewer
morphological parameters do not seem to change the results
substantially.

Fig. 5 also shows the local density evaluation, as in
Fig. 3. It can easily be seen that the intermediate objects lie in
the "valley" between the two major clumps of data points. This
is expected, since we wanted to point out the relative
difference between these objects and the galaxies belonging to
the two clusters.



4.3. Extension to low redshifts 

Due to the parameter choice of this analysis, we were forced to 
limit the analysis to a sub-sample of the 10k zCOSMOS sample: 
as we said in §2, the spectral features involved in the analysis
(D4000 and EW[O II]) are detectable within zCOSMOS-bright
only at 0.48 < z < 1.28. The upper limit in redshift coincides
with the limit of the zCOSMOS-bright survey, but the nearest
galaxies (at z < 0.48) were left out of the analysis.
In order to expand the analysis and to follow the behaviour of 
galaxies in the entire redshift range of zCOSMOS-bright survey, 
we decided to exploit the PCA+UFP method to probe the galax- 






G. Coppa et al.: A new galaxy classification based on the global optical properties of zCOSMOS 10k sample. 




Fig. 7. Biplot of PC1-PC2 plane for low redshift galaxies. 



ies even at lower redshifts, substituting the spectral features used 
at high redshifts with one of the best star formation indicators,
Hα, which is detectable within zCOSMOS-bright from the local
universe to z ~ 0.48. This is one of the main reasons behind
this work: the PCA+UFP method, not being tied to a particular 
set of data, is able to use different parameters and probe different 
redshift ranges and properties of the galaxies. 

For the extension to low redshifts we therefore considered
7 observable parameters: Δ(B - z), M20, concentration C, Gini
coefficient G, asymmetry A, clumpiness S and EW₀(Hα). As
in the previous analysis with EW₀[O II], we considered the
logarithm of the equivalent width because of its log-normal
distribution, so from now on EW₀(Hα) is to be read as log EW₀(Hα).
The low redshift sample defined in this way is composed of
3402 galaxies. The results of the PCA are shown
in Table 6. As for the analysis at high redshifts, we decided to
consider those PCs that give a cumulative variance of at least
80%. In this case we took into account the first 4 PCs, which
account for 89% of the total original variance.
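
The selection of the number of components can be reproduced from the
proportions of variance listed in Table 6. The short Python sketch
below is only an illustration: the helper name is hypothetical, and the
80% criterion is the one stated in the text.

```python
import numpy as np

# Proportion of variance of each principal component for the
# low redshift sample, as listed in Table 6.
prop_variance = np.array([0.483, 0.177, 0.126, 0.104, 0.060, 0.035, 0.014])

def n_components_for(prop_var, min_cumulative=0.80):
    """Smallest number of leading PCs whose cumulative variance
    reaches min_cumulative."""
    cumulative = np.cumsum(prop_var)
    # argmax returns the index of the first True entry.
    return int(np.argmax(cumulative >= min_cumulative)) + 1

n_keep = n_components_for(prop_variance)
cum_kept = float(np.cumsum(prop_variance)[n_keep - 1])
```

Three components give a cumulative variance just below the 80%
criterion, so the fourth is needed; the first four together account for
about 89% of the variance.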

In Fig. 7 the biplot of the PCA for low redshift galaxies is
shown. Comparing it with Fig. 4, one can see the striking
resemblance in the shape of the cloud and in the directions of the
loadings. The role of D4000 and EW₀[O II], namely to segregate
the galaxies mainly along the PC1 direction, is taken over by
EW₀(Hα), while the relations among the other parameters remain
largely unchanged. With respect to Fig. 4, galaxies in the
early-type cluster spread more in PC2 (which is mainly
morphology driven): this is probably because ACS is progressively
better able to recognise features, even in spheroidal galaxies,
with decreasing redshift, owing to the larger angular size of the
galaxies themselves. So spheroidal galaxies with streams due to
encounters with companions, interacting galaxies or just objects
with companions nearby have larger values of asymmetry A and
clumpiness S than galaxies with similar features at higher
redshifts (whose angular dimensions are smaller, and whose
features will most likely be too small and faint to be picked up by
an automatic analysis). This is evident in Fig. 8, where ACS
snapshots of the galaxies in the early type cluster with the highest
values of the second principal component (PC2 > 2) are shown.



Fig. 8. Composite ACS image (see Koekemoer et al. 2007) of
low redshift early type galaxies with the highest values of PC2.
Their morphologies are quite complex, suggesting tidal
interactions and recent merging.

Fig. 9 shows the result of applying the UFP clustering
algorithm to the low redshift sample of galaxies. As in the previous
analysis of the high redshift sample, we used a threshold of
90% membership to distinguish between objects belonging to
the "early-type" cluster, objects belonging to the "late-type" one
and objects not belonging to any cluster, our "green valley"
galaxies. Green valley objects lie in the saddle between the two
main clusters, as can be seen from the isodensity contours in the
plot, calculated by Gaussian square kernel smoothing of the
PC1-PC2 and PC1-PC3 planes in a way similar to that used for the
high redshift galaxies (Fig. 5). With respect to high redshift
galaxies, the clusters of low redshift galaxies appear less centred
and defined: green dots, for instance, appear well beyond the 90%
isoprobability boundaries that define them. This is because the
isoprobability curves are merely 2D projections of 4D
hypersurfaces, since, as we said, we considered the first 4 PCs for
the cluster analysis.

Out of the 3402 objects that make up the low redshift sample,
early type galaxies represent 20.7% (704 objects), late
type galaxies 70.6% (2401) and green valley galaxies
8.7% (297). With respect to the high redshift sample, green
valley objects represent more or less the same percentage of
objects, while there is a significant shift of populations between
the two main clusters: late type galaxies are ~ 10% more numerous
than in the high redshift sample, while early types are
correspondingly ~ 10% less numerous. This is likely due to a
selection effect (at low redshift we are sampling galaxies with
lower luminosities and lower masses, which are on average "later"
at all redshifts) rather than a real evolutionary feature. In the
next section we will explore in more detail the evolution of the
galaxy populations with redshift.



5. Results 

The PCA+UFP analysis presented in this work offers many im- 
provements with respect to the previous methods of classifica- 
Parameter         PC1      PC2      PC3      PC4      PC5      PC6      PC7
EW(Hα)           0.340   -0.097   -0.545   -0.541   -0.529    0.032   -0.042
Δ(B - z)          ...      ...      ...      ...      ...      ...      ...
G               -0.439    0.024   -0.463   -0.024    0.249    0.717    0.119
M20               ...      ...      ...      ...      ...      ...      ...
C                0.216    0.634    0.358   -0.520    0.178    0.217    0.269
A               -0.471    0.167   -0.045   -0.427    0.186   -0.293   -0.666
S                0.086    0.698   -0.499    0.423   -0.007   -0.274   -0.004
Prop. Variance   0.483    0.177    0.126    0.104    0.060    0.035    0.014
Cum. Variance    0.483    0.660    0.786    0.891    0.950    0.986    1.000

Table 6. Results of the Principal Component Analysis applied to the low redshift (z < 0.48) galaxies.




Fig. 9. Cluster analysis results for low redshift galaxies.
Superimposed on the points, as in Fig. 5, are the isodensity
contours of the points, calculated via kernel smoothing in the
PC1-PC2 and PC1-PC3 planes. The curved lines represent the
projected isoprobability curves. Clusters and green valley objects
appear more scattered across the planes because of the projection
from the four-dimensional PC space onto the 2 dimensions of the
plot.



tion like the classification cube. One of the greatest advantages 
of such an approach is its self-consistency and its global
approach to the parameters: as we stated in §4.2, the classification
cube is prone to errors in one or more of its sub-classification
methods because they are "hard partition" ones. Since every
parameter is treated separately from the others, it is easier for
one of them to be misclassified because of internal errors or
because its value lies close to a boundary.



The PCA+UFP method reduces the possibility of this kind
of error because its parameters are treated simultaneously: by
using the PCA on a multidimensional space we are "averaging out"
outlying values in a small number of parameters. This can be
intuitively understood by looking at the biplots (Figs. 4 and 7): an
outlying value in M20, for instance, can be compensated by
"normal" values in the spectral emission lines, D4000 and C.

Another powerful feature of the PCA+UFP analysis is its
flexibility: while the classification cube is strongly bound to its
defining parameters (and for this reason has been applied only to
the high redshift sample in this work), the PCA+UFP analysis
is not restricted to a particular dataset or a particular set of
parameters. We can therefore extend the work to low redshifts just
by substituting the two spectral parameters with a different one.
Hα was chosen in order to retain the possibility of comparing
the results of the high and low redshift samples, and to have a
comprehensive look at the whole 10k dataset. In fact, the
PCA+UFP method can also be applied to completely different
datasets (star formation rates, masses, luminosities) from this or
other galaxy surveys, and that is possibly its most important
achievement.

In the next subsections we will show some of the properties
of the whole 10k population, and of a few interesting subsamples,
in the PCA+UFP analysis.



5.1. Combined high and low redshift sample 

Fig. 10 shows the evolution of the different populations of
galaxies, within the whole 10k sample, with redshift and with mass.
Masses have been computed by Bolzonella et al. (2009), using
Bruzual & Charlot (2003) population synthesis models, by
means of the Hyperzmass code, a modified version of the
photo-z code Hyperz (Bolzonella et al. 2000).

Low mass galaxies (log M/M⊙ < 9.9, first column) are almost
exclusively part of the late-type cluster, while high mass
galaxies (log M/M⊙ > 10.7, last column) mainly belong to the
early-type cluster. The transition is seen mostly in the
intermediate mass bins: at 9.9 < log M/M⊙ < 10.3, galaxies at
high redshift (z > 0.80) are still actively forming stars, and are
therefore concentrated in the late-type cluster; the "migration"
towards the early type cluster seems to begin at moderately lower
redshifts (0.60 < z < 0.80), slowing down from z ~ 0.50 and
still being ongoing in the local Universe.

At slightly larger masses (10.3 < log M/M⊙ < 10.7) this
transition appears to happen at earlier epochs: at 0.60 < z < 0.80
early-type and late-type galaxies are numerically comparable,
and the transition appears almost complete at 0.30 < z < 0.45.
At very low redshifts (z < 0.30) the percentage of late-type
galaxies seems to rise again: this is most likely due to the effect
of asymmetry A and clumpiness S in low-redshift ACS images
Fig. 10. PC1-PC2 diagrams for the low redshift (upper two rows) and high redshift (lower three rows) samples, kernel smoothed with
the usual technique. Columns represent bins of mass (increasing from left to right, as specified inside the boxes of the first row), while rows
represent bins of redshift (increasing from top to bottom, as specified in the boxes of the first column). Each panel also shows the absolute
numbers and fractions of galaxies in each cluster (early-type, late-type and green valley), in red, blue and green respectively.
Some of the high redshift panels show the mass completeness (as computed by Pozzetti et al. 2009); where no percentage is given,
the sample is to be regarded as mass-complete.



we mentioned in §4.3. This delay in the quenching of star
formation for the lower mass galaxies, as opposed to the more
massive ones, can be regarded as one manifestation of the
downsizing effect: the main reasons behind this effect are still
unclear, even though some mechanisms have been suggested
(Bower et al. 2006; Hopkins et al. 2006; Dekel & Birnboim 2006).
Some numerical simulations (Schweizer 2000) show that the
transition in colours should be very fast (of the order of ~ 500 Myr),
and other observational studies seem to suggest that this is the
case if the star formation is quenched efficiently; Balogh et al.
(2004), however, showed that an exponentially decaying star
formation can lengthen the transition phase to some Gyrs. Our
work seems to suggest that a global transition (from our "late type"
locus to the "early type" one) takes longer to be achieved (at least
some Gyrs). Part of this is certainly due to the changes in colours
and morphologies taking place on different timescales.

Looking at Fig. 10 by rows it is possible to appreciate the
mass distribution of the galaxy population at fixed redshifts. At
low redshifts the zCOSMOS survey cannot sample the high mass
galaxies (log M/M⊙ > 10.7) because of the small sampled volume
and the bright magnitude cut, so the corresponding boxes are
empty. At higher redshifts mass incompleteness prevents us from
directly comparing the numbers of galaxies in each mass bin (as
can be seen in the plot, at z > 0.80 the mass completeness
Fig. 11. Evolution with redshift of the fractions of the different galaxy populations as a function of mass. Each panel shows, for a specific
redshift bin, the fraction of galaxies in each mass bin that belong to each PCA+UFP cluster (late-type galaxies in cyan, early-type ones in red,
green valley ones in green). Errors are 95% confidence intervals for multinomial populations (Miller 1966). Vertical
dotted lines represent the 90% mass completeness in each redshift bin. The last panel shows the evolution in z of the transition
mass (M_cross), defined as the point where the red and cyan lines meet (open circles and solid line). The associated errors are given by the
width of the region where the two strips meet. Dashed and dot-dashed lines represent the transition masses as calculated in Pozzetti
et al. (2009), using Marseille morphologies and SED-colour photometric classifications respectively. The dotted line represents the
transition masses calculated using the Balogh et al. (2004) definition of green valley applied to our combined sample (see §5.2).



of the sample with log M/M⊙ < 9.9 is of the order of 20%).
However, this is not a severe issue when dealing with fractions
within each mass and redshift bin; we can assume that within
the bin the mass distribution is rather flat. Nevertheless, because
of mass incompleteness the highest redshift and lowest mass bins
are to be considered with caution.



We summarise these considerations in Fig. 11, where each of
the first five panels represents a row of Fig. 10, i.e. a bin of
redshift into which we divided our sample. For every given redshift
bin the fractions of early type, late type and intermediate objects
in each mass bin are plotted. Low mass early type galaxies are
very few (~ 4%) in every redshift bin, late types being by far
the most frequent at log M/M⊙ < 9.9, as can also be seen in the
first column of Fig. 10. This is in good agreement with the
determinations of Kovač et al. (2010) for the same zCOSMOS
sample, who found a similar behaviour in different environments for
galaxies of different morphological type.

Intermediate objects seem to be numerically important
around log M/M⊙ ~ 10.5 at high redshifts, constituting up to
~ 20% of the sample at z ~ 0.5. This suggests that the
evolutionary transition from the blue cloud towards the red
sequence may be most important at intermediate redshifts and
intermediate masses (central quadrants in Fig. 10).



From Fig. 11 the masses at which early-type and late-type
galaxies are equally numerous at different redshifts (M_cross)
can also be derived. This transition mass M_cross is plotted in the
lower right panel of Fig. 11 as a function of redshift. The
transition masses computed in this work (solid line in the plot) are
in fair agreement with those calculated by Pozzetti et al. (2009)
using Marseille morphologies (Cassata et al. 2007, 2008; Tasca et
al. 2009) as separators of the different galaxy types (dashed line
in the figure) and using a photometric classification (Zucca et al.
2009; dot-dashed line). A Cramér-von Mises test (Anderson 1962)
confirms the consistency of the three estimates of M_cross
(p-values above 0.73). It must be kept in mind, though, that the
determinations of M_cross in this work are made within a
three-cluster framework (early type, late type and intermediate
galaxies), while the other determinations take into account only
the two main galaxy populations. Splitting our intermediate galaxy
sample between the other two clusters, using a 50% threshold on
the membership values, the evolution of M_cross with redshift
steepens, and the transition masses agree even better, especially
at high redshifts. Considering the different techniques of
calculation, however, the agreement among these determinations is
quite remarkable.
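
To make the definition of the transition mass concrete, the Python
sketch below locates the mass at which two fraction curves cross by
linear interpolation between adjacent mass bins. It is only an
illustration under stated assumptions: the bin centres and fractions
are invented numbers, not values from this work, and linear
interpolation is a simplification of reading the crossing off the error
strips in Fig. 11.

```python
import numpy as np

def crossing_mass(log_mass, frac_early, frac_late):
    """Log-mass at which the early and late type fractions are equal,
    found by linear interpolation; None if the curves never cross."""
    log_mass = np.asarray(log_mass, dtype=float)
    diff = (np.asarray(frac_early, dtype=float)
            - np.asarray(frac_late, dtype=float))
    for i in range(len(diff) - 1):
        if diff[i] == 0.0:
            return float(log_mass[i])
        if diff[i] * diff[i + 1] < 0.0:
            # Sign change between bins i and i+1: interpolate linearly.
            t = diff[i] / (diff[i] - diff[i + 1])
            return float(log_mass[i] + t * (log_mass[i + 1] - log_mass[i]))
    if diff[-1] == 0.0:
        return float(log_mass[-1])
    return None

# Hypothetical fractions in four mass bins of a single redshift bin.
log_mass_demo = [9.7, 10.1, 10.5, 10.9]
frac_early_demo = [0.05, 0.20, 0.45, 0.70]
frac_late_demo = [0.90, 0.65, 0.40, 0.20]
m_cross_demo = crossing_mass(log_mass_demo, frac_early_demo, frac_late_demo)
```

For these invented values the curves cross between the second and third
bins, at a log-mass of about 10.46.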



5.2. Green valley galaxies 

Green valley galaxies have been defined in a number of different
ways, usually exploiting the natural bimodal distribution of
galaxies in colour indicators like u - r (Strateva et al. 2001;
Baldry et al. 2004), U - V (Brown et al. 2007; Silverman et al.
2008), U - B (Vergani et al. 2010) and B - i (Caputi et al. 2009).
In this subsection we will analyse the U - V rest-frame colour
distribution (from now on (U - V)₀) of our PCA+UFP clustered
galaxies.

The (U - V)₀ distribution of the combined high+low redshift
samples (Fig. 12) shows a clear bimodality, which reflects
the global one we discussed throughout the paper. The separation
between the two families in colour occurs at (U - V)₀ ~
1.6; the colour distribution of our late type galaxies peaks at
(U - V)₀ ~ 1, while the distribution of the early types peaks
at (U - V)₀ ~ 1.9. All of these values are in fair agreement with
other determinations from the literature (Silverman et al. 2008;
Brammer et al. 2009). The distribution of the green valley objects
peaks at (U - V)₀ ~ 1.5, near the saddle of the total distribution.

We can compare the (U - V)₀ distribution of our green valley
galaxies with the Balogh et al. (2004) definition of green valley,
i.e. the 0.2 mag dip between the two observed Gaussian
distributions for early- and late-type galaxies. Applying
this definition, 760 objects out of 8256 (9.2%) in the combined
sample would be defined as "green valley" objects; this
number is very close to the number of green valley galaxies
in our classification (721, 8.7%). More than 25% of our green
valley objects are also green valley objects in the Balogh et al.
(2004) definition, while the rest of the objects within those
boundaries are almost equally divided by PCA+UFP between the two
main clusters. The largest part of our intermediate galaxies lies to
the left of the colour-defined green valley, i.e. in the region of
the blue galaxies, but makes up only 6.5% of all the objects in
that region; conversely, PCA+UFP intermediate galaxies constitute
8.4% of all the objects in the red galaxies region.

Being based on the overall properties of the galaxies, our
classification method gives somewhat different results compared to
the classical colour definitions of green valley: the cores of the
early-type and late-type clusters are correctly reproduced, but our
classification suggests that relying on a single colour might not be
sufficient to correctly recover those galaxies which are really in
transition between the late-type and early-type clusters.

The transition masses M_cross of the sample divided using the
Balogh et al. (2004) definition of green valley were also calculated
(dotted line in Fig. 11); the agreement between the determinations
is very high, even considering the uncertainties in the first
redshift bin due to the low number of objects. Using a mass and/or
redshift dependent colour definition of the green valley (e.g. Brand
et al. 2009), the results are very similar.



5.3. Red spirals 

We checked the PCA+UFP clustering properties of some of the 
outliers in the classification cube. Obviously this has been possi- 
ble only with galaxies from the high redshift sample, because the 



Fig. 12. Rest frame U - V distributions of the galaxies in the
combined sample (high+low redshift). Open histograms represent
the distribution of the total sample; blue, red and green
histograms represent the distributions of PCA+UFP late types, early
types and intermediate galaxies, respectively. Dashed lines
represent the green valley boundaries as defined by Balogh et al.
(2004), for comparison.



classification cube has been defined using D4000 and EW₀[O II],
which were available only at z > 0.48 (see §3.1). Red spirals, for
instance, are often identified with edge-on spiral galaxies,
reddened by a strong dust lane (Zucca et al. 2009; Tasca et al. 2009),
while face-on red spirals are thought to be the very oldest spirals,
which have used up their gas reservoirs, probably aided by
strangulation and bar instabilities (Masters et al. 2009). In our
classification cube, red spirals may be identified by the three-digit
codes "112" and "212", both representing morphological late-type
galaxies (third digit "2"), the former representing spectrally
passive red objects and the latter red star-forming galaxies.

There are 93 galaxies with classification cube code "112": of those,
24 (25.8%) are classified by PCA+UFP in the green valley
group, 27 (29%) are in the late type cluster and 43 (46.2%) are in
the early type cluster. A fairly high number of them (14) have
unusually high values of PC2: on visual inspection those
objects revealed very disturbed morphologies, dominated by merging
and tidal streams (Fig. 8), in agreement with the findings
of Conselice et al. (2000), who found that very large values of
A (reflected in our work in large values of PC2) are a good
indication of ongoing major merging. At least for these objects,
automatic morphological classification methods apparently fail to
identify them correctly as merging spheroidals: their asymmetric
characteristics are instead interpreted as late type morphologies.

There are 74 galaxies with classification cube code "212": 25 of
them (33.8%) are classified in the green valley group, 43 (58.1%)
are in the late type cluster and only 6 (8.1%) are classified in the
early type cluster. Their range in PC1 and PC2 is quite narrow,
making those objects a rather homogeneous sample, located in
the middle of the PC1-PC2 diagram, in or very near the low
density saddle between the clusters. Those galaxies, showing spiral
morphologies, low star formation rates (indicated by PC1 ~ 0)
and reddish colours, are the best candidates for the old spiral
population mentioned by Masters et al. (2009).



5.4. Blue ellipticals 

In our classification cube, blue ellipticals are identified by the 
three-digit codes "121" and "221", the first one representing 
spectrally passive objects and the latter one referring to active 
star-forming galaxies, both bulge-dominated. 

Classification cube code "121" galaxies are almost exclusively
assigned to the early type cluster by the PCA+UFP algorithm
(60/64), while code "221" galaxies show a more diverse
behaviour, being almost equally divided among the
groups: 56 out of 169 (33.1%) belong to the green valley group,
52 (30.8%) to the late type cluster and 61 (36.1%) to the early
type cluster. In PCA terms, objects in the latter group are
characterised by positive values of PC2 and generally negative
values of PC1: while code "121" galaxies are most probably the
result of a colour misclassification in the classification cube, and
are therefore "normal" early type galaxies (as confirmed by their
Δ(B - z) being very close to the dividing line in Fig. 2), code "221"
objects seem to be more complex. Late type "221"s have large
values of PC2, while the PC2 value of early type "221"s is
around 0. This may imply a misclassification in Δ(B - z), too,
but it is not sufficient to explain all their features. Most
probably many of these objects, especially at higher values of PC1,
have complex morphologies and are the result of tidal
interactions.

These results seem to imply that for these objects the
spectrophotometric properties are given more importance than the
morphological ones by the PCA+UFP algorithm. In fact, as we said,
a spiral morphology classifier, especially when using wide
classifiers and automatic recognition systems, is more subject to
errors due to the asymmetries of merging objects.

5.5. Active Galactic Nuclei 

We also investigated the positions, in the PCA spaces, of known
AGN in the zCOSMOS sample. Type-1 AGN, which are easily
recognisable by their broad emission lines, are given a particular
spectroscopic confidence class when their redshifts are determined
and have been excluded from the subsamples. Type-2
AGN, on the other hand, are included in the sample since they
are more difficult to identify, because their emission lines are
very similar to those of regular star-forming galaxies. We used
the diagnostic diagram selection of Bongiorno et al. (2009) to
identify Seyfert 2 galaxies and LINERs and investigate their
positions in the PCA planes. Two different diagnostic diagrams have
been exploited to select type-2 AGN: at low redshift the line
ratios [N II]/Hα and [O III]/Hβ were used, whereas at high redshift
the line ratios [O III]/Hβ and [O II]/Hβ were used. Unfortunately,
the different ionization properties of Seyfert 2 galaxies and
LINERs are separable with the diagnostic diagrams only at
low redshifts. For this reason we will discuss the properties of
the whole type-2 AGN population (which includes both active
galaxy classes) in the two redshift ranges, separating the LINERs
and Seyfert 2 galaxies only for z < 0.5 (for a more detailed
analysis see Bongiorno et al. 2009).

The analysed sample is composed of 79 type-2 AGN in
the high redshift range and 125 type-2 AGN (95 of which are
LINERs, while the other 30 are Seyfert 2 galaxies) in the low
redshift range. Considering both the high redshift and the low
redshift samples, 204 galaxies are classified as Narrow Line
AGN: 126 of them (62%) are placed by the PCA+UFP algorithms
in the late type cluster, while 47 (23%) are in the early
type cluster and 31 (15%) are in the green valley region. If we
restrict our analysis to the low redshift sample, 95 active
galaxies are classified as LINERs: 54 of them (57%) are in the late
type cluster, 22 (23%) are in the early type one and 19 (20%)
are in the green valley. The 30 pure Seyfert 2 galaxies, in turn,
are placed by our PCA+UFP algorithms as follows: 15 of
them (50%) in the late type cluster, 11 (37%) in the early type
region and only 4 (13%) in the green valley. Though we are
dealing with small number statistics, it is clear that the majority
of the analysed type-2 AGN are hosted by galaxies which belong to
the blue, late-type cluster. This is to be expected, since our
active galaxies span the low luminosity regime, as indicated by
the [O III] λ5007 Å line luminosity, 10^5.5 L⊙ < L[O III] < 10^9.1 L⊙
(Bongiorno et al. 2009).

We also explored the fraction of the selected active nuclei 
in the various clusters as defined by PCA+UFP with respect to 
the parent population of all galaxies. While the fraction of type- 
2 AGN in each main cluster is around 2%, this class of objects 
constitutes ~ 4% of the galaxies in the PCA+UFP green valley 
region. At low redshifts, LINERs represent 2% of the objects in 
the late type cluster and 3% of galaxies in the early type one, 
but they make up 6% of the green valley galaxies. This picture 
suggests a possible enhancement of type-2 AGN in the green 
valley region. However, since numbers are small - and there- 
fore errors are large - this might not be statistically significant. 
In fact, the observed type-2 AGN fractions in these subclasses 
are still compatible with being flat sub-samples extracted purely 
randomly from the parent sample. 

6. Summary and conclusions 



The classification cube method (Mignoli et al. 2009) has 
been extended and applied to the high redshift sample of the 
zCOSMOS-bright 10k release, exploiting bimodalities in spectral 
(D4000 and [O II] equivalent width), photometric (B - z colour) 
and morphological (ZEST classification scheme) properties of 
the galaxies. In order to overcome some of its limitations 
(rigidity of the scheme due to its "hard partitioning", the 
possibility and nature of misclassifications, reliance on a 
particular set of data and the difficulty of adopting different 
variables, a certain degree of arbitrariness in the boundary 
definitions for the subclassifications), in this work we set up 
a different classification method based on statistical 
approaches, namely Principal Component Analysis and 
Unsupervised Fuzzy Partition clustering (PCA+UFP), which 
exploits the bimodal nature of galaxy properties in a more 
organic and rigorous way. 

The PCA+UFP analysis is a very powerful and robust tool 
to probe the nature and the evolution of galaxies in a survey. 
It allows galaxies to be classified with smaller uncertainties 
and has the flexibility to be adapted to different parameters; 
being a fuzzy classification, it avoids the problems related 
to a hard classification. The PCA+UFP method can be easily 
applied to different datasets: it does not rely on the nature of 
the data and for this reason it can be successfully employed 
with other observables (magnitudes, colours) or derived 
properties (masses, luminosities, SFRs, etc.). 
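
To illustrate the difference between a fuzzy and a hard classification, the sketch below runs a plain fuzzy c-means clustering on a handful of points. This is a deliberately simplified stand-in: it is not the UFP algorithm used in this work, the PCA step is omitted, the coordinates are hypothetical positions in a two-dimensional principal-component plane rather than survey data, and the membership threshold used to flag intermediate objects is an arbitrary choice made for this example.

```python
from math import dist

def memberships(points, centres, m=2.0):
    """Fuzzy c-means memberships: one row per point, each row sums to 1."""
    power = 2.0 / (m - 1.0)
    rows = []
    for p in points:
        # Distances to every centre, floored to avoid division by zero.
        d = [max(dist(p, c), 1e-12) for c in centres]
        rows.append([1.0 / sum((dj / dk) ** power for dk in d) for dj in d])
    return rows

def fuzzy_c_means(points, centres, m=2.0, n_iter=50):
    """Alternate membership and centre updates for a fixed number of steps."""
    dim = len(points[0])
    for _ in range(n_iter):
        u = memberships(points, centres, m)
        new_centres = []
        for j in range(len(centres)):
            w = [row[j] ** m for row in u]
            # Centre = membership-weighted mean of all points.
            new_centres.append(tuple(
                sum(wi * p[k] for wi, p in zip(w, points)) / sum(w)
                for k in range(dim)
            ))
        centres = new_centres
    return centres, memberships(points, centres, m)

# Hypothetical coordinates in a 2D principal-component plane (not survey data).
points = [
    (-2.1, 0.2), (-1.9, -0.1), (-2.0, 0.0), (-2.2, -0.2),  # "late-type" group
    (2.0, 0.1), (1.8, -0.2), (2.1, 0.0), (2.2, 0.2),       # "early-type" group
    (0.0, 0.1),                                            # in-between object
]
centres, u = fuzzy_c_means(points, centres=[(-1.0, 0.0), (1.0, 0.0)])

THRESHOLD = 0.7  # hypothetical cut; not a value taken from the paper
labels = []
for row in u:
    if max(row) < THRESHOLD:
        labels.append("intermediate")
    else:
        labels.append("late" if row[0] > row[1] else "early")
print(labels)
```

A hard partition would force the in-between object into one of the two groups, whereas its nearly equal memberships mark it explicitly as an intermediate case.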

The agreement between the cluster definitions of the two 
classification methods is very high. "Early" and "late" type 
galaxies are well defined by the spectral, photometric and 
morphological properties, both when these are considered 
separately and the classifications are then combined 
(classification cube) and when they are treated as a whole 
(PCA+UFP cluster analysis). Differences arise in the 
definition of outliers: the classification cube is much more sen- 
sitive to single measurement errors or misclassifications in one 
property than the PCA+UFP cluster analysis, in which possible 
measurement errors are "averaged out" during the process. 

The PCA+UFP analysis has also been applied to the low redshift 
sample, substituting D4000 and EW([O II]) with EW(Hα). 
The PCA+UFP analyses of the high and the low redshift samples 
allowed us to observe the downsizing effect taking place in the 
PC spaces: the migration from the blue cloud towards the red 
clump happens at higher redshifts for galaxies of larger mass. 
The determination of M_cross, the transition mass, is in good 
agreement with other values in the literature. 

The green valley objects, as defined with the PCA+UFP 
cluster analysis, also represent a more coherent sample than 
those obtained with classical colour definitions, having the 
same overall physical properties. Subsequent X-ray and radio 
analyses could help to further unveil the nature of these 
transitional objects. 